Explanations
references to mice and rats
mice, rats, rodents
mouse and rodent
Negative Logits
ویکیپدی    -0.76
#    -0.68
kasarigan    -0.67
/***/    -0.55
="@+    -0.51
thăng    -0.50
šana    -0.50
illoma    -0.49
іга    -0.48
bardier    -0.48
Positive Logits
mouse    1.02
Mouse    1.02
Mice    0.98
mice    0.98
mouse    0.97
rodents    0.95
MOUSE    0.94
🐁    0.90
🐭    0.88
rodent    0.87
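The positive and negative logit lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. The sketch below shows one common way such lists are produced. It assumes that a token's score is the dot product of the feature's decoder direction with that token's unembedding column, and that the dashboard shows the top and bottom k scores. The function names `logit_effects` and `top_and_bottom`, and all vectors, are hypothetical toy values, not this dashboard's actual code or weights.

```python
# Hypothetical sketch: rank tokens by the direct effect of a feature's decoder
# direction on their logits. ASSUMPTION: effect = dot(decoder_vec, unembedding
# column of token). All numbers below are toy values, not real model weights.


def logit_effects(decoder_vec, unembed):
    """Map each token to the dot product of decoder_vec with its unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k largest effects (descending) and the k smallest (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    decoder_vec = [1.0, 0.0]  # toy 2-d feature direction
    unembed = {  # toy per-token unembedding columns
        "mouse": [1.0, 0.5],
        "rodent": [0.8, 2.0],
        "table": [-0.6, 3.0],
        "#": [-0.7, -1.0],
    }
    positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed), 2)
    print("positive:", positive)
    print("negative:", negative)
```

Listing the negative side most-negative-first matches the ordering used in the Negative Logits list.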
Activation Density: 0.183%
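Activation density is typically read as the fraction of token positions on which the feature fires. The sketch below computes it under that assumption, treating any activation above a threshold (0 by default) as "firing". The function name `activation_density` and the sample activations are hypothetical and do not reproduce the dataset behind the 0.183% figure.

```python
# Hypothetical sketch: activation density = fraction of token positions whose
# feature activation exceeds a threshold. ASSUMPTION: "active" means > 0.


def activation_density(activations, threshold=0.0):
    """Return the fraction of positions with activation above threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy data: 1000 positions, 2 of which have a positive activation.
    acts = [0.0] * 997 + [0.5, 1.2, 0.0]
    print(f"Activation Density: {activation_density(acts):.3%}")
```

A feature with density this low fires rarely, which is consistent with it being narrowly tied to a single concept.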