INDEX
Explanations
phrases related to responses to various stimuli or conditions
Negative Logits
houſe      -0.91
ſelves     -0.91
ſelf       -0.83
ⓧ          -0.82
Normdatei  -0.80
Diſ        -0.79
ſmall      -0.79
ьаж        -0.79
ſta        -0.78
phalt      -0.78
Positive Logits
propria       0.52
of            0.52
AIR           0.51
rada          0.50
Rujukan       0.49
près          0.49
propertyName  0.49
populaire     0.49
alongside     0.48
pios          0.48
Activations Density: 0.075%