INDEX
Explanations
phrases related to evaluation or exploration
phrases indicating objectives or methods for acquiring knowledge and understanding
Negative Logits
Token     Logit
ãĥIJ      -0.66
squares   -0.64
towels    -0.62
feces     -0.60
benches   -0.58
drowned   -0.58
veins     -0.58
Croat     -0.57
ãģ®ç      -0.56
peas      -0.56
Positive Logits
Token     Logit
INESS     0.69
iona      0.68
ocious    0.66
uce       0.65
Higher    0.65
umar      0.65
ocard     0.63
ocation   0.63
FK        0.63
Ratio     0.62
Activations Density 0.336%
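
A density figure like the one above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration and are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active tokens out of 1000.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.336% would mean the feature is active on roughly 3 to 4 tokens per thousand sampled.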